8 research outputs found

    Scenic: A Language for Scenario Specification and Scene Generation

    We propose a new probabilistic programming language for the design and analysis of perception systems, especially those based on machine learning. Specifically, we consider the problems of training a perception system to handle rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs and sampling from these to generate specialized training and test sets. More generally, such languages can be used for cyber-physical systems and robotics to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment is a "scene", a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing "scenarios" that are distributions over scenes. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.

    Comment: 41 pages, 36 figures. Full version of a PLDI 2019 paper (extending UC Berkeley EECS Department Tech Report No. UCB/EECS-2018-8
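    The abstract above describes assigning distributions to scene features and imposing hard constraints that the sampler must respect. As a minimal sketch of the underlying idea (this is not Scenic syntax; the toy two-car scene, all names, and all numbers here are invented for illustration), naive rejection sampling over such a constrained scene distribution might look like:

```python
import random

def sample_scene():
    # Hypothetical scene features: positions of an ego car and another
    # car on a 100 m stretch of road (illustrative only).
    ego_x = random.uniform(0, 100)
    other_x = random.gauss(ego_x + 20, 10)  # other car roughly 20 m ahead
    return {"ego_x": ego_x, "other_x": other_x}

def satisfies_constraints(scene):
    # Hard constraints: the other car stays on the road and keeps
    # at least 5 m of separation from the ego car.
    on_road = 0 <= scene["other_x"] <= 100
    separated = abs(scene["other_x"] - scene["ego_x"]) >= 5
    return on_road and separated

def sample_scenario(max_tries=10_000):
    # Rejection sampling: draw from the feature distributions, then
    # discard samples that violate the declarative constraints.
    for _ in range(max_tries):
        scene = sample_scene()
        if satisfies_constraints(scene):
            return scene
    raise RuntimeError("constraints too tight for naive rejection sampling")
```

    Naive rejection becomes inefficient as constraints tighten, which is why the paper develops specialized sampling techniques that exploit the structure of the language.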

    Object Class Detection and Pose Estimation from Synthetic 3D Models

    This dissertation aims at extending object class detection and pose estimation on single 2D images with a 3D model-based approach. It describes learning, detection, and estimation steps adapted to the use of synthetically rendered data with known 3D geometry. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. By using existing CAD models and rendering techniques from computer graphics, parameterized to reproduce variations commonly found in real images, we instead build 3D representations of object classes that handle viewpoint changes and intra-class variability. These 3D representations are derived in two different ways: either through an unsupervised filtering process that selects pose- and class-discriminant local features from purely synthetic training data, or as a part model that discriminatively learns the object class appearance from an annotated database of real images and builds a generative representation of 3D geometry from a database of synthetic CAD models. During detection, we introduce a 3D voting scheme that reinforces geometric coherence by means of robust pose estimation, and we propose an alternative probabilistic pose estimation method that evaluates the likelihood of groups of 2D part detections with respect to a full 3D geometry. Both detection methods yield approximate 3D bounding boxes in addition to 2D localizations; these initializations are subsequently refined by a registration scheme that aligns arbitrary 3D models to optical and Synthetic Aperture Radar (SAR) images in order to disambiguate and prune 2D detections and to handle occlusions. The work is evaluated on several standard benchmark datasets and is shown to achieve state-of-the-art performance for 2D detection in addition to providing 3D pose estimates from single images.

    Self-calibrating 3D context for retrieving people with luggage

    Multi-View Object Class Detection with a 3D Geometric Model

    This paper presents a new approach to multi-view object class detection. Appearance and geometry are treated as separate learning tasks with different training data. Our approach uses a part model which discriminatively learns the object appearance with spatial pyramids from a database of real images, and encodes the 3D geometry of the object class in a generative representation built from a database of synthetic models. The geometric information is linked to the 2D training data and enables approximate 3D pose estimation for generic object classes. The pose estimation provides an efficient way to evaluate the likelihood of groups of 2D part detections with respect to a full 3D geometry model in order to disambiguate and prune 2D detections and to handle occlusions. In contrast to other methods, neither tedious manual part annotation of training images nor explicit appearance matching between synthetic and real training data is required, which results in high geometric fidelity and increased flexibility. On the 3D Object Category datasets CAR and BICYCLE [15], the current state-of-the-art benchmark for 3D object detection, our approach outperforms previously published results for viewpoint estimation.
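    The pose evaluation described above scores groups of 2D part detections against a full 3D geometry model. A hedged sketch of one plausible scoring scheme, assuming a pinhole camera and a Gaussian reprojection likelihood (the function names, the focal length, and the likelihood form are assumptions, not the paper's exact formulation):

```python
import numpy as np

def project(points_3d, R, t, f=800.0):
    # Pinhole projection of 3D part centers under pose (R, t).
    cam = points_3d @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3]

def pose_likelihood(parts_3d, detections_2d, R, t, sigma=10.0):
    # Score a group of 2D part detections against the 3D geometry:
    # project each 3D part under the hypothesized pose and accumulate
    # a Gaussian likelihood of the reprojection errors.
    proj = project(parts_3d, R, t)
    err = np.linalg.norm(proj - detections_2d, axis=1)
    return float(np.exp(-0.5 * (err / sigma) ** 2).sum())
```

    A pose hypothesis whose projected parts line up with the detections scores higher than a perturbed one, which is what lets such a likelihood prune inconsistent 2D detections.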

    Learning to Represent Multiple Object Classes on a Continuous Viewsphere

    Existing work on multi-class object detection usually does not cover the entire viewsphere of each class in a continuous way: object classes seen from different viewpoints are either discretized into a few sparse viewpoints [12], or treated as entirely separate object classes [20]. In the present work, we describe an approach to multi-class object detection which allows sharing parts between different viewpoints and several classes while also learning a dense representation of the entire viewsphere of each class. We describe three learning approaches with different part sharing strategies in order to reduce the computational complexity of the learnt representation. Our approach uses synthetic training data to achieve dense viewsphere coverage, which also allows object class and 3D pose estimation to be performed on single images.

    Interactive

    Active Appearance Models (AAMs) have been widely used to represent the appearance and shape variations of human faces. Fitting an AAM to images recovers the face pose as well as its deformable shape and varying appearance. Successful fitting requires that the AAM is sufficiently generic to cover all possible facial appearances and shapes in the images. Such a generic AAM is often difficult to obtain in practice, especially when the image quality is low or when occlusion occurs. To achieve robust AAM fitting under such circumstances, this paper proposes to incorporate disparity data obtained from a stereo camera into the image fitting process. We develop an iterative multi-level algorithm that combines efficient AAM fitting to 2D images with robust 3D shape alignment to disparity data. Experiments on tracking faces in low-resolution images captured in meeting scenarios show that the proposed method achieves better performance than the original 2D AAM fitting algorithm. We also demonstrate an application of the proposed method to a facial expression recognition task.
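    The iterative algorithm described above alternates between a 2D image-fitting update and a 3D alignment to disparity data. A minimal sketch of that alternation pattern, with placeholder step functions standing in for the paper's actual AAM fitting and 3D alignment updates (the convergence test and parameter vector are illustrative assumptions):

```python
import numpy as np

def alternate_fit(params0, fit_2d_step, align_3d_step, n_iters=20, tol=1e-6):
    # Alternate an AAM-style 2D image-fitting update with a 3D shape
    # alignment to disparity data until the parameters stop changing.
    # Both step functions map a parameter vector to an updated one.
    params = np.asarray(params0, dtype=float)
    for _ in range(n_iters):
        new = align_3d_step(fit_2d_step(params))
        if np.linalg.norm(new - params) < tol:
            return new
        params = new
    return params
```

    When each step is a contraction, the alternation converges to a joint fixed point that satisfies both the image evidence and the disparity data, which is the intuition behind combining the two cues.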

    Viewpoint-Independent Object Class Detection Using 3D Feature Maps

    This paper presents a 3D approach to multi-view object class detection. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. We propose instead to build 3D representations of object classes that handle viewpoint changes and intra-class variability. Our approach extracts a set of pose- and class-discriminant features from synthetic 3D object models using a filtering procedure, evaluates their suitability for matching to real image data, and represents them by their appearance and 3D position. We term these representations 3D Feature Maps. To recognize an object class in an image, we match the synthetic descriptors to the real ones in a 3D voting scheme. Geometric coherence is reinforced by means of a robust pose estimation which yields a 3D bounding box in addition to the 2D localization. The precision of the 3D pose estimation is evaluated on a set of images of a calibrated scene. The 2D localization is evaluated on the PASCAL 2006 dataset for motorbikes and cars, showing that its performance can compete with state-of-the-art 2D object detectors.
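    The 3D voting scheme above accumulates evidence from synthetic-to-real descriptor matches. As a coarse stand-in for that idea (the binning, the azimuth-only pose, and the match format are invented simplifications, not the paper's method), weighted votes over discretized pose bins might look like:

```python
from collections import Counter

def vote_poses(matches, n_bins=8):
    # Each descriptor match carries a pose hypothesis (here reduced to
    # an azimuth angle in degrees) and a match weight. Votes accumulate
    # in discretized azimuth bins; the most supported bin wins.
    votes = Counter()
    bin_width = 360 / n_bins
    for azimuth_deg, weight in matches:
        votes[int(azimuth_deg // bin_width) % n_bins] += weight
    return votes.most_common(1)[0][0] if votes else None
```

    A robust pose estimate would then be refined within the winning bin; outlier matches (like the stray vote below) are simply outvoted by the geometrically coherent cluster.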